Measuring performance variability of EC2

Henrik Ingo (cc by) v1

Agenda

MongoDB performance testing
Witchcraft
Method for testing noise in our EC2 clusters
I/O tests
CPU tests
Network tests
Canary workloads
Summary

MongoDB performance testing

100+ Projects
1500+ Hosts
100+ Build Variants
400k hours/month
Performance = 5% of that
(more in $$$)

github.com/evergreen-ci

Microbenchmarks

System performance test
EC2 (this talk)

c3.8xlarge, SSD

The goal

Repeatable results

(NOT max performance)

Assumption	True / False
Dedicated instance = more stable performance	Not tested
Placement groups minimize network latency & variance	Not tested
Different availability zones have different hardware	Seems False
For write heavy tests, noise comes from disk	False
Ephemeral (SSD) disks have least variance	False
There are good and bad EC2 instances	False
Just use i2 instances (better SSD)	False (True in theory)
You can't use cloud for performance testing	False

We tested many aspects of EC2 and our own system. To help you follow the presentation, I will reveal up front the assumptions that were made when the system was first built, and how those assumptions fared in our testing.

In the rest of the presentation I will share how we tested different EC2 configurations and came to these conclusions.

It's common to see engineers making design decisions based on things they read on the internet. As you can see, our system included LOTS of them! I call it witchcraft: old wives' tales, not based in science. The point of this presentation is that this is a bad idea! There are no shortcuts. Assume nothing. Measure everything.

Method for testing noise in our EC2 clusters

Each test produces one or more values as its result.
- ops/sec
To measure noise in the system:
- Lock the mongod binary used
- Repeat each test 5 times
- Repeat that on 5 different EC2 clusters
- = 25 data points
In addition to MongoDB benchmarks, also test infrastructure components:
- disk = fio, CPU = home-made tests, network = iperf3
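The sampling scheme above (5 repeats on each of 5 clusters, 25 data points per test) could be sketched roughly as follows. This is only an illustration, not our actual harness: `collect_data_points`, `flatten`, the `run_test` callback and the cluster names are all hypothetical, and the numbers in the demo are made up.

```python
import random
from typing import Callable, Dict, List


def collect_data_points(
    clusters: List[str],
    run_test: Callable[[str], float],
    repeats: int = 5,
) -> Dict[str, List[float]]:
    """Run the same test `repeats` times on every cluster.

    `run_test` is a hypothetical callback that runs one benchmark on the
    given cluster (with a fixed mongod binary) and returns ops/sec.
    """
    results: Dict[str, List[float]] = {}
    for cluster in clusters:
        results[cluster] = [run_test(cluster) for _ in range(repeats)]
    return results


def flatten(results: Dict[str, List[float]]) -> List[float]:
    """Merge the per-cluster results into one list of data points."""
    return [value for values in results.values() for value in values]


# Demo with a fake benchmark; real runs would drive an actual test here.
rng = random.Random(42)
demo_clusters = [f"cluster-{i}" for i in range(1, 6)]  # made-up names
demo_results = collect_data_points(
    demo_clusters, lambda cluster: 10_000 + rng.uniform(-500, 500)
)
print(len(flatten(demo_results)))  # 5 clusters x 5 repeats
```

Keeping the results per cluster (rather than only the flat list) makes it possible to compare variation within one cluster against variation between clusters.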

What is noise?

noise = (max - min) / median

Goal is to minimize this single metric
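As a minimal sketch, the noise metric defined above can be computed like this. The function name `noise` and the sample values are illustrative assumptions; only the formula itself comes from the slide.

```python
from statistics import median
from typing import Sequence


def noise(values: Sequence[float]) -> float:
    """Return (max - min) / median for a set of results, e.g. ops/sec."""
    if not values:
        raise ValueError("need at least one data point")
    mid = median(values)
    if mid == 0:
        raise ValueError("median is zero, noise is undefined")
    return (max(values) - min(values)) / mid


# Made-up ops/sec results: range is 700, median is 10000.
samples = [9500, 10000, 10200, 9900, 10100]
print(f"noise = {noise(samples):.1%}")
```

Because the metric uses min and max, a single outlier among the data points is enough to raise it, which is the point: one bad run makes results non-repeatable.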

There are good and bad EC2 instances

False

(min - median - max)
for each test & thread level

mmapv1 left, wiredTiger right

insert_vector, insert_ttl, index_build
highest; jtrue lowest
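The (min - median - max) summary per test and thread level shown in the charts could be produced along these lines. This is a sketch under assumptions: the `(test, threads, ops_per_sec)` tuple layout and the `summarize` helper are hypothetical, and the sample numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]  # (test name, thread level)


def summarize(
    samples: Iterable[Tuple[str, int, float]]
) -> Dict[Key, Tuple[float, float, float]]:
    """Group results by test and thread level; return (min, median, max)."""
    grouped: Dict[Key, List[float]] = defaultdict(list)
    for test, threads, ops_per_sec in samples:
        grouped[(test, threads)].append(ops_per_sec)
    return {
        key: (min(values), median(values), max(values))
        for key, values in grouped.items()
    }


# Invented data points for two thread levels of one example test.
demo_samples = [
    ("example_test", 1, 1000.0),
    ("example_test", 1, 1100.0),
    ("example_test", 1, 1050.0),
    ("example_test", 8, 4000.0),
    ("example_test", 8, 5200.0),
    ("example_test", 8, 4600.0),
]
for (test, threads), (lo, mid, hi) in sorted(summarize(demo_samples).items()):
    print(f"{test} threads={threads}: min={lo} median={mid} max={hi}")
```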

Ephemeral (SSD) disks have least variance	False
Remote EBS disks have unreliable performance	False (piops)
Just use i2 instances (better SSD)	False (True in theory)